Client Report - What’s in a Name?

Course DS 250

Author

Dylan Kohn

import pandas as pd
import numpy as np
from lets_plot import *

LetsPlot.setup_html(isolated_frame=True)

Project Notes

For Project 1, the answer to each question should include a chart and a written response. The year labels on your charts should not include a comma. At least two of your charts must include reference marks.

# Learn more about Code Cells: https://quarto.org/docs/reference/cells/cells-jupyter.html

# Load the yearly baby-name counts used throughout the report
df = pd.read_csv("https://raw.githubusercontent.com/byuidatascience/data4names/master/data-raw/names_year/names_year.csv")

QUESTION|TASK 1

How does your name at your birth year compare to its use historically?

The name Dylan became popular in the 1990s, began to drop off around 2000, and has recently started to stabilize.

# Name and birth year to highlight
myName = "Dylan"
birthYear = 2004

# Keep only the rows for this name, plus the single row for the birth year
df_name = df[df["name"] == myName]
birth_year_data = df_name[df_name["year"] == birthYear]

# Line chart of yearly totals with the birth year marked in red
p = (ggplot(df_name) +
     geom_line(aes(x='year', y='Total'), color="blue") +
     geom_point(aes(x='year', y='Total'), data=birth_year_data, color="red", size=5) +
     scale_x_continuous(format='d') +  # show years without a thousands comma
     ggtitle(f"Popularity of the Name '{myName}' Over Time") +
     xlab("Year") + ylab(f"Number of Babies Named {myName}") +
     ggsize(400, 200))

p.show()
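
The comparison between the birth year and the name's history can also be put into numbers. The following is a minimal sketch, not part of the original analysis: it assumes a DataFrame with the same `year` and `Total` columns as the report, the helper `compare_to_peak` is hypothetical, and the counts in `toy` are made up for illustration rather than taken from names_year.csv.

```python
import pandas as pd

# Hypothetical toy data with the same columns the report uses.
# These counts are invented for illustration; they are NOT from names_year.csv.
toy = pd.DataFrame({
    "name": ["Dylan"] * 5,
    "year": [1990, 1995, 2000, 2004, 2010],
    "Total": [2000, 8000, 10000, 9000, 7000],
})

def compare_to_peak(df_name: pd.DataFrame, year: int) -> dict:
    """Return the count in `year`, the peak year, and the count as a share of the peak."""
    # Row with the largest yearly total
    peak_row = df_name.loc[df_name["Total"].idxmax()]
    # Total for the requested year (assumes exactly one row per year)
    year_total = df_name.loc[df_name["year"] == year, "Total"].iloc[0]
    return {
        "year_total": int(year_total),
        "peak_year": int(peak_row["year"]),
        "share_of_peak": float(year_total / peak_row["Total"]),
    }

summary = compare_to_peak(toy, 2004)
print(summary)
```

Applied to the real data, this would be called as `compare_to_peak(df_name, birthYear)`; a share close to 1 would mean the birth year sits near the name's peak.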

QUESTION|TASK 2

If you talked to someone named Brittany on the phone, what is your guess of his or her age? What ages would you not guess?

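The name Brittany was very popular for a short time around 1990 and then quickly declined. I would guess that a Brittany was born close to 1990, and I would not guess a birth year near 1968, when the name's usage was at its lowest.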

# Name to investigate
test_name = "Brittany"
df_name = df[df["name"] == test_name]

# Rows with the highest and lowest yearly totals, and the years they fall in
max_point = df_name[df_name["Total"] == df_name["Total"].max()]
min_point = df_name[df_name["Total"] == df_name["Total"].min()]
max_year = max_point["year"].values[0]
min_year = min_point["year"].values[0]

# Line chart with the peak year in green and the lowest year in red
p = (ggplot(df_name) +
     geom_line(aes(x='year', y='Total'), color="blue") +
     geom_point(aes(x='year', y='Total'), data=max_point, color="green", size=5) +
     geom_point(aes(x='year', y='Total'), data=min_point, color="red", size=5) +
     scale_x_continuous(format='d') +  # show years without a thousands comma
     ggtitle(f"Popularity of the Name '{test_name}' Over Time") +
     xlab("Year") + ylab(f"Number of Babies Named {test_name}") +
     ggsize(400, 200))

p.show()
print(f"Most likely {max_year}, least likely {min_year}.")
Most likely 1990, least likely 1968.
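
The question asks for an age rather than a birth year, so one possible extension is to turn the yearly counts into a likely range of birth years and then into ages. The sketch below is an assumption-laden illustration: the helpers `likely_birth_years` and `age_range` are hypothetical, the 80% coverage and the reference year are arbitrary choices, and the counts in `toy` are invented rather than taken from names_year.csv.

```python
import pandas as pd

# Hypothetical toy counts for one name; NOT real values from names_year.csv.
toy = pd.DataFrame({
    "year": [1980, 1985, 1990, 1995, 2000],
    "Total": [1000, 15000, 30000, 10000, 2000],
})

def likely_birth_years(df_name: pd.DataFrame, coverage: float = 0.8) -> tuple:
    """Return (first, last) years of the central `coverage` share of all births."""
    ordered = df_name.sort_values("year")
    # Running share of all babies given the name up to and including each year
    cum_share = ordered["Total"].cumsum() / ordered["Total"].sum()
    tail = (1 - coverage) / 2
    first = ordered.loc[cum_share >= tail, "year"].iloc[0]
    last = ordered.loc[cum_share >= 1 - tail, "year"].iloc[0]
    return int(first), int(last)

def age_range(first_year: int, last_year: int, reference_year: int) -> tuple:
    """Convert a birth-year range into (youngest, oldest) ages in `reference_year`."""
    return reference_year - last_year, reference_year - first_year

first, last = likely_birth_years(toy)
# The reference year is an assumption; replace it with the year of the phone call.
youngest, oldest = age_range(first, last, reference_year=2025)
print(f"Likely born {first}-{last}, so roughly {youngest}-{oldest} years old.")
```

Ages outside the returned range are the ones the data would argue against guessing.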

QUESTION|TASK 3

Mary, Martha, Peter, and Paul are all Christian names. From 1920 - 2000, compare the name usage of each of the four names in a single chart. What trends do you notice?

After 1975, the use of all four names decreased significantly.

# Array to store names
names = ["Mary", "Martha", "Peter", "Paul"]

# Filter the dataframe
df_filtered = df.query('name in @names and year >= 1920 and year <= 2000')

# Build the chart (LetsPlot was already initialized at the top of the report)
p = (ggplot(data=df_filtered) +
     geom_line(aes(x='year', y='Total', color='name')) +
     scale_x_continuous(format='d') +  # show years without a thousands comma
     ggtitle("Usage of Christian Names (Mary, Martha, Peter, Paul) Over Time") +
     xlab("Year") +
     ylab("Number of Babies with Each Name") +
     ggsize(400, 200))

p.show()
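
A trend statement like the one above can be backed by a small per-name summary. The following sketch, which is not part of the original analysis, finds each name's peak year and how far usage had fallen by the last year in the window. It assumes the same `name`, `year`, and `Total` columns as the report; the counts in `toy` are invented for illustration and are not from names_year.csv.

```python
import pandas as pd

# Hypothetical toy data for two names; NOT real values from names_year.csv.
toy = pd.DataFrame({
    "name": ["Mary", "Mary", "Mary", "Paul", "Paul", "Paul"],
    "year": [1920, 1950, 2000, 1920, 1950, 2000],
    "Total": [50000, 60000, 6000, 5000, 25000, 7000],
})

# Index label of the row with the largest Total for each name
peak_idx = toy.groupby("name")["Total"].idxmax()
peaks = toy.loc[peak_idx, ["name", "year", "Total"]].set_index("name")

# Count in the final year of the window, per name
last_total = toy.sort_values("year").groupby("name")["Total"].last()

# Final-year usage as a share of the peak (aligned on the name index)
peaks["end_share_of_peak"] = last_total / peaks["Total"]
print(peaks)
```

On the real data the same steps would run on `df_filtered`, which already restricts the years to 1920 through 2000.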

QUESTION|TASK 4

Think of a unique name from a famous movie. Plot the usage of that name and see how changes line up with the movie release. Does it look like the movie had an effect on usage?

Usage of the name Maverick rose slightly after the release of Top Gun in 1986, but rose significantly more after the sequel, Top Gun: Maverick, was released.

# Define variables
movie_name = "Maverick"
movie_release_year = 1986

# Filter the dataframe using the query function
df_filtered = df.query('name == @movie_name')

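# Build the chart (LetsPlot was already initialized at the top of the report)
p = (ggplot(data=df_filtered) +
     geom_line(aes(x='year', y='Total'), color='blue') +
     # Reference mark: dashed red line at the movie's release year
     geom_vline(xintercept=movie_release_year, linetype="dashed", color="red") +
     scale_x_continuous(format='d') +  # show years without a thousands comma
     ggtitle(f"Usage of the Name '{movie_name}' Over Time") +
     xlab("Year") +
     ylab(f"Number of Babies Named {movie_name}") +
     ggsize(400, 200))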

p.show()
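
To go beyond reading the chart by eye, average usage in the years just before and just after the release can be compared. This is a rough sketch under stated assumptions: the helper `before_after_means` is hypothetical, the four-year window is an arbitrary choice, and the counts in `toy` are invented rather than taken from names_year.csv. A higher average afterwards is consistent with a movie effect but does not prove one.

```python
import pandas as pd

# Hypothetical toy counts around a 1986 release; NOT real values from names_year.csv.
toy = pd.DataFrame({
    "year": [1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990],
    "Total": [10, 12, 11, 15, 20, 30, 35, 40, 45],
})

def before_after_means(df_name: pd.DataFrame, release_year: int, window: int = 4) -> tuple:
    """Mean yearly Total in the `window` years before and after `release_year`.

    The release year itself is excluded from both averages.
    """
    years = df_name["year"]
    before_mask = (years >= release_year - window) & (years < release_year)
    after_mask = (years > release_year) & (years <= release_year + window)
    before = df_name.loc[before_mask, "Total"].mean()
    after = df_name.loc[after_mask, "Total"].mean()
    return float(before), float(after)

before, after = before_after_means(toy, release_year=1986)
print(f"Mean before: {before}, mean after: {after}")
```

With the report's variables this would be called as `before_after_means(df_filtered, movie_release_year)`.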

STRETCH QUESTION|TASK 1

Reproduce the chart Elliot using the data from the names_year.csv file.

# Set the variables and define the years we want to look at
name = "Elliot"
release_years = [1982, 1985, 2002]
release_labels = ["E.T Released", "Second Release", "Third Release"]

# Filter the dataframe using the query function
df_filtered = df.query('name == @name and year >= 1950')

# Build the annotation labels, placed near the top of the chart
# (LetsPlot was already initialized at the top of the report)
annotations = pd.DataFrame({
    'year': release_years,
    'label': release_labels,
    'y': [df_filtered["Total"].max() * 0.9] * len(release_years)
})

# Chart the graph with a dashed reference line and a label for each release
p = (ggplot(data=df_filtered) +
     geom_line(aes(x='year', y='Total', color='name'), size=1.2, alpha=0.7) +
     geom_vline(xintercept=release_years, linetype="dashed", color="red") +
     geom_text(aes(x='year', y='y', label='label'),
               data=annotations,
               angle=0, vjust=-0.5, hjust=0.5, size=10) +
     scale_x_continuous(format='d') +  # show years without a thousands comma
     ggtitle("Elliot") +
     xlab("year") +
     ylab("Total") +
     ggsize(800, 400))

# Show the graph
p.show()